Towards a Benchmark for ETL Workflows
Abstract
Extraction–Transform–Load (ETL) processes are complex data workflows responsible for maintaining a Data Warehouse. Their practical importance is evident from the fact that the many available ETL tools together constitute a multi-million-dollar market. However, each tool follows its own design and modeling technique and its own internal language. So far, the research community has not agreed upon the basic characteristics of ETL tools; hence, a unified way to assess ETL workflows is needed. In this paper, we investigate the main characteristics and peculiarities of ETL processes and propose a principled organization of test suites for experimenting with ETL scenarios.
Similar resources
Benchmarking ETL Workflows
Extraction–Transform–Load (ETL) processes comprise complex data workflows, which are responsible for the maintenance of a Data Warehouse. A plethora of ETL tools is currently available, constituting a multi-million-dollar market. Each ETL tool uses its own technique for the design and implementation of an ETL workflow, making the task of assessing ETL tools extremely difficult. In this paper, we...
Logical Optimization of ETL Workflows
Extraction-Transformation-Loading (ETL) tools are pieces of software responsible for the extraction of data from several sources, their cleansing, customization and insertion into a data warehouse. Usually, these processes must be completed in a certain time window; thus, it is necessary to optimize their execution time. In this paper, we delve into the logical optimization of ETL processes, mo...
Determining Essential Statistics for Cost Based Optimization of an ETL Workflow
Many of the ETL products in the market today provide tools for the design of ETL workflows, with very little or no support for optimization of such workflows. Optimization of ETL workflows poses several new challenges compared to traditional query optimization in database systems. There have been many attempts both in the industry and the research community to support cost-based optimization techniq...
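To illustrate why statistics matter for cost-based optimization of an ETL workflow, here is a minimal sketch of a simple cost model for a linear chain of activities. It is not the method of the paper above: the `Activity` class, the per-row cost and selectivity statistics, and the exhaustive `cheapest_order` search are all hypothetical, and reordering is only valid when the activities commute (for example, independent filters and lookups).

```python
from dataclasses import dataclass
from itertools import permutations

@dataclass(frozen=True)
class Activity:
    name: str
    cost_per_row: float  # hypothetical statistic: processing cost per input row
    selectivity: float   # hypothetical statistic: fraction of input rows passed on

def chain_cost(activities, input_rows):
    """Total cost of running the activities in the given order."""
    total, rows = 0.0, float(input_rows)
    for a in activities:
        total += rows * a.cost_per_row  # each activity pays for every row it receives
        rows *= a.selectivity           # and forwards only a fraction downstream
    return total

def cheapest_order(activities, input_rows):
    # Exhaustive search over orderings; assumes all activities commute.
    return min(permutations(activities), key=lambda order: chain_cost(order, input_rows))

lookup = Activity("surrogate_key_lookup", cost_per_row=5.0, selectivity=1.0)
filt = Activity("filter_nulls", cost_per_row=0.1, selectivity=0.2)
best = cheapest_order([lookup, filt], input_rows=1000)
```

With these assumed statistics, running the cheap, selective filter first costs 1000 × 0.1 + 200 × 5 = 1100 units, versus 1000 × 5 + 1000 × 0.1 = 5100 units for the opposite order; without selectivity and cost estimates the optimizer could not tell the two plans apart.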
Systematic ETL management - Experiences with high-level operators
Large organizations load much of their data into data warehouses for subsequent querying, analysis, and data mining. Extract-Transform-Load (ETL) workflows populate those data warehouses with data from various data sources by specifying and executing a set of transformations forming a directed acyclic transformation graph (DAG). Over time, hundreds of individual ETL workflows evolve as new sour...
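The idea of an ETL workflow as a directed acyclic graph of transformations can be sketched in a few lines of Python using the standard-library `graphlib.TopologicalSorter` (Python 3.9+). This is a hypothetical toy, not the operators from the paper above: the activity names, the in-memory rows, and the dict standing in for the warehouse are illustrative assumptions. Each activity runs only after all of its inputs have been produced.

```python
from graphlib import TopologicalSorter

def run_workflow(workflow):
    """workflow maps an activity name to (function, [names of input activities])."""
    graph = {name: set(deps) for name, (_, deps) in workflow.items()}
    results = {}
    # static_order() yields every activity after its predecessors and raises CycleError on cycles
    for name in TopologicalSorter(graph).static_order():
        fn, deps = workflow[name]
        results[name] = fn(*(results[d] for d in deps))
    return results

# Hypothetical sources and transformations
def extract_orders():
    return [{"order_id": 1, "cust": "c1", "amount": "10.50"},
            {"order_id": 2, "cust": "c2", "amount": None},
            {"order_id": 3, "cust": "c1", "amount": "7"}]

def extract_customers():
    return [{"cust": "c1", "country": "de"}, {"cust": "c2", "country": "fr"}]

def clean_orders(orders):
    # Drop rows with a missing amount and convert amounts to floats
    return [{**o, "amount": float(o["amount"])} for o in orders if o["amount"] is not None]

def join_customers(orders, customers):
    # Enrich each order with its customer's upper-cased country code
    country = {c["cust"]: c["country"].upper() for c in customers}
    return [{**o, "country": country[o["cust"]]} for o in orders]

def load_fact_table(rows):
    # Stand-in for the warehouse: a dict keyed by order_id
    return {r["order_id"]: r for r in rows}

WORKFLOW = {
    "extract_orders": (extract_orders, []),
    "extract_customers": (extract_customers, []),
    "clean_orders": (clean_orders, ["extract_orders"]),
    "join": (join_customers, ["clean_orders", "extract_customers"]),
    "load": (load_fact_table, ["join"]),
}

warehouse = run_workflow(WORKFLOW)["load"]
```

Here the two extraction activities have no predecessors, so either may run first; the join waits for both branches. Order 2 is dropped by the cleaning step, leaving orders 1 and 3 in the warehouse.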
Blueprints for ETL workflows
Extract-Transform-Load (ETL) workflows are data-centric workflows responsible for transferring, cleaning, and loading data from their respective sources to the warehouse. Previous research has identified graph-based techniques that construct the blueprints for the structure of such workflows. In this paper, we extend existing results by explicitly incorporating the internal semantics of each act...